Results 1 - 20 of 36
1.
7th Arabic Natural Language Processing Workshop, WANLP 2022 held with EMNLP 2022 ; : 1-10, 2022.
Article in English | Scopus | ID: covidwho-2290872

ABSTRACT

Named Entity Recognition (NER) is a well-known problem in the natural language processing (NLP) community. It is a key component of many NLP applications, including information extraction, question answering, and information retrieval. The literature contains several Arabic NER datasets with different named entity tags; however, because of data and concept drift, new data are continually needed for NER and other NLP applications. In this paper, we first introduce Wassem, a web-based annotation platform for Arabic NLP applications. Wassem can be used to manually annotate textual data for a variety of NLP tasks: text classification, sequence classification, and word segmentation. Second, we introduce the COVID-19 Arabic Named Entities Recognition (CAraNER) dataset, extracted from the Arabic Newspaper COVID-19 Corpus (AraNPCC). CAraNER has 55,389 tokens distributed over 1,278 sentences randomly extracted from Saudi Arabian newspaper articles published in 2019, 2020, and 2021. The dataset was labeled by five annotators with five named-entity tags: Person, Title, Location, Organization, and Miscellaneous. The CAraNER corpus is freely available for download. We evaluate the corpus by fine-tuning four BERT-based Arabic language models on it. The best model, AraBERTv0.2-large, achieved a macro F1 score of 0.86. © 2022 Association for Computational Linguistics.
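To make the reported "macro F1" concrete, the sketch below computes F1 separately for each of the five tag classes and then takes their unweighted mean. It assumes token-level scoring and uses hypothetical short tag names (PER, TITLE, LOC, ORG, MISC); the paper's exact evaluation protocol (for example, entity-level span matching) is not stated in the abstract.

```python
from collections import Counter

def macro_f1(gold, pred, labels):
    """Token-level macro F1: per-label F1, then an unweighted mean over labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p where it was wrong
            fn[g] += 1  # gold label g that was missed
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); an unseen label scores 0 here (conventions vary)
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical tag names for the five CAraNER classes; "O" is excluded from the average.
LABELS = ["PER", "TITLE", "LOC", "ORG", "MISC"]
gold = ["PER", "O", "LOC", "ORG", "MISC", "TITLE"]
pred = ["PER", "O", "LOC", "LOC", "MISC", "TITLE"]
print(round(macro_f1(gold, pred, LABELS), 3))  # → 0.733
```

Because every class counts equally, a single weak class (ORG above) pulls the macro score down more than it would a frequency-weighted average.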

2.
Int J Mol Sci ; 23(23)2022 Nov 29.
Article in English | MEDLINE | ID: covidwho-2296973

ABSTRACT

The body of scientific literature continues to grow annually. Over 1.5 million abstracts of biomedical publications were added to the PubMed database in 2021. Therefore, cognitive systems that provide specialized search for information in scientific publications, based on subject-area ontologies and modern artificial intelligence methods, are urgently needed. We previously developed a web-based information retrieval system, ANDDigest, designed to search and analyze information in the PubMed database using a customized domain ontology. This paper presents an improved ANDDigest version that uses fine-tuned PubMedBERT classifiers to improve short name recognition for molecular-genetics entities in PubMed abstracts across eight biological object types: cell components, diseases, side effects, genes, proteins, pathways, drugs, and metabolites. This approach increased average short name recognition accuracy by 13%.


Subject(s)
Artificial Intelligence , Data Mining , Data Mining/methods , PubMed , Databases, Factual , Proteins
3.
13th IEEE International Conference on Knowledge Graph, ICKG 2022 ; : 56-63, 2022.
Article in English | Scopus | ID: covidwho-2258490

ABSTRACT

While manual analysis of news coverage is difficult and time-consuming, natural language processing methods can be used to uncover otherwise hidden semantics. This work analyses more than 370,000 news articles to explore connections and trends in business decisions and their financial impact during the COVID-19 pandemic. Topic modelling, sentiment analysis and named entity recognition methods are used to identify connections between the articles and the financial performance of selected companies or industries. This report sets out the results of the individual natural language processing methods and the resulting analysis with financial data. Contrasting media topics associated with the companies with the highest or lowest positive sentiment can be filtered out. This information could help companies understand which topics are currently treated favourably or unfavourably by the media and hence assist with communication strategies and competitive intelligence. © 2022 IEEE.

4.
2022 IEEE International Conference on Big Data, Big Data 2022 ; : 5173-5181, 2022.
Article in English | Scopus | ID: covidwho-2248652

ABSTRACT

Clinical Cohort Studies (CCS), such as randomized clinical trials, are a rich source of documented clinical research. Ideally, a clinical expert inspects these articles for exploratory analyses ranging from drug discovery, such as evaluating the efficacy of existing drugs against emerging diseases, to the first tests of newly developed drugs. However, more than 100 articles on a single prevalent disease such as COVID-19 are published in PubMed every day. As a result, it can take a physician days to find articles and extract relevant information. Can we develop a system that sifts through these articles faster and documents the crucial takeaways from each of them? In this work, we propose CCS Explorer, an end-to-end system for sentence relevance prediction, extractive summarization, and patient, outcome, and intervention entity detection from CCS. CCS Explorer is packaged in a web-based graphical user interface into which the user can enter any disease name. CCS Explorer then extracts and aggregates all relevant information from PubMed articles retrieved by a query generated automatically on the back end. For each task, CCS Explorer fine-tunes pre-trained transformer-based language representation models with additional layers. The models are evaluated on two publicly available datasets. CCS Explorer obtains a recall of 80.2%, an AUC-ROC of 0.843, and an accuracy of 88.3% on sentence relevance prediction using BioBERT, and achieves an average micro F1-score of 77.8% on Patient, Intervention, Outcome (PIO) detection using PubMedBERT. Thus, CCS Explorer can reliably extract relevant information to summarize articles, reducing the time required by roughly 660×. © 2022 IEEE.
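One common way to turn per-sentence relevance predictions into an extractive summary is to keep the highest-scoring sentences and then restore their original order. The sketch below assumes the relevance scores are already available (the values here are invented), and the `k` and `threshold` parameters are illustrative choices, not settings taken from CCS Explorer.

```python
def extractive_summary(sentences, scores, k=2, threshold=0.5):
    """Keep up to k sentences whose relevance score passes the threshold, in document order."""
    # Rank sentence indices from most to least relevant.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = [i for i in ranked if scores[i] >= threshold][:k]
    # Re-sort the chosen indices so the summary reads in the article's order.
    return [sentences[i] for i in sorted(chosen)]

# Hypothetical sentences and model scores for illustration only.
sentences = [
    "The trial enrolled adult inpatients.",
    "Patients received drug X or placebo.",
    "Mortality at day 28 was lower with drug X.",
    "Funding was provided by a public agency.",
]
scores = [0.3, 0.7, 0.9, 0.2]
summary = extractive_summary(sentences, scores, k=2)
print(summary)
```

Restoring document order keeps intervention and outcome sentences in their natural sequence even when the outcome sentence scores higher.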

5.
Expert Systems with Applications ; 223, 2023.
Article in English | Scopus | ID: covidwho-2263399

ABSTRACT

Because of the frequent occurrence of chronic diseases, the COVID-19 pandemic, and other factors, online health expert question-answering (HQA) services have been unable to cope with the rapidly increasing demand for online consultations. A virtual health assistant based on medical named entity recognition (NER) can effectively support the consultation process, but the unstandardized expressions in HQA text pose a serious challenge for medical NER. The main goal of this study is to propose a novel deep medical NER approach based on a collaborative decision strategy (CDS), co_decision_NER (CDN), that can identify both standard and nonstandard medical entities in the HQA context. We collected 10,000 question–answer pairs from HaoDF, extracted medical entities from 15 entity categories, and used a CDS to fuse the advantages of different NER models. CDN achieved a performance (precision = 84.50%, recall = 84.30%, F1 = 84.40%) that was significantly better than that of the state-of-the-art (SOTA) method. Our empirical analysis suggests that Disease (DIS), Sign (SIG), Test (TES), Drug (DRU), Surgery (SUR), Precaution (PRE), and Region (REG) are the entity types most prone to arbitrary expression in the doctor–patient interactions of HQA services. In addition, CDN can identify not only standard but also nonstandard medical entities, effectively alleviating the severe out-of-vocabulary (OOV) problem that HQA services face in medical NER tasks. The core contribution of this study is a novel neural network model fusion algorithm that can improve entity recognition performance in domain-specific medical tasks. © 2023 Elsevier Ltd
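The abstract does not specify how the collaborative decision strategy combines models, so the sketch below substitutes the simplest fusion rule, a per-token majority vote, purely to illustrate the idea of merging several NER models' outputs. It also checks that the reported F1 follows from the reported precision and recall via F1 = 2PR / (P + R). The model outputs are hypothetical; the tags reuse the entity abbreviations from the abstract.

```python
from collections import Counter

def vote(label_sequences):
    """Fuse per-token labels from several NER models by majority vote.
    Ties go to the label from the earliest model in the list (an assumed tie-break)."""
    fused = []
    for token_labels in zip(*label_sequences):
        counts = Counter(token_labels)
        best = max(counts.values())
        fused.append(next(label for label in token_labels if counts[label] == best))
    return fused

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Hypothetical outputs of three models for the same four tokens.
model_a = ["B-DIS", "I-DIS", "O", "B-DRU"]
model_b = ["B-DIS", "O", "O", "B-DRU"]
model_c = ["B-SIG", "I-DIS", "O", "O"]
print(vote([model_a, model_b, model_c]))  # → ['B-DIS', 'I-DIS', 'O', 'B-DRU']
print(round(f1(0.8450, 0.8430), 4))       # → 0.844
```

A plain vote ignores which model is strongest for each entity type; a learned or type-aware strategy such as the paper's CDS presumably addresses that.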

6.
2022 IEEE MIT Undergraduate Research Technology Conference, URTC 2022 ; 2022.
Article in English | Scopus | ID: covidwho-2230986

ABSTRACT

This paper presents a named entity recognition system for the specific domain of Vietnamese COVID-19 news articles. By incorporating manually selected, domain-specific features into a simple deep learning architecture, the system can identify a wide range of custom named entities relevant to COVID-19 and future epidemics. Using high-dimensional embedding vectors combined with part-of-speech tags and additional features, the system achieves an F-score of 90.41%, surpassing or approaching the results of more complex or pre-trained and fine-tuned models. © 2022 IEEE.

7.
2022 IEEE MIT Undergraduate Research Technology Conference, URTC 2022 ; 2022.
Article in English | Scopus | ID: covidwho-2223158


8.
2022 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2022 ; : 2274-2280, 2022.
Article in English | Scopus | ID: covidwho-2223066

ABSTRACT

To support efficient learning from the massive number of publications produced during the COVID-19 pandemic, we propose a pipeline, Knowledge Extraction for COVID-19 Publications (KEP), that automatically extracts and represents key knowledge from publications of interest to the user. The first version, KEP-1.0, has been developed and published on the Python Package Index (PyPI) (URL: https://pypi.org/project/KEP/). This first release provides knowledge about key topics, disease discussions, and location mentions for each publication. KEP-1.0 not only extracts relevant knowledge but, more importantly, highlights the most discussed entities and presents visual plots, including bar graphs and word clouds. This enables a rapid preliminary understanding of the main discussions in a publication from these three aspects. Moreover, an enhanced TF-IDF algorithm, weighted TF-IDF, designed for identifying publication topics, has been proposed and evaluated. The pipeline is fully open-source and customizable. KEP-1.0 can be used in its current form or embedded into existing literature platforms. Although the pipeline is designed for COVID-related publications, it could benefit similar knowledge extraction tasks for other topics with rapidly growing numbers of publications. © 2022 IEEE.
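The abstract does not define the weighting in KEP's weighted TF-IDF, so the sketch below shows plain TF-IDF keyword ranking with an optional, hypothetical per-term `weights` multiplier as a placeholder for where such a weighting could enter. It is not KEP's implementation; the toy documents are invented.

```python
import math
from collections import Counter

def top_terms(docs, doc_index, n=3, weights=None):
    """Rank the terms of docs[doc_index] by TF-IDF.
    `weights` is a hypothetical per-term multiplier, not KEP's actual scheme."""
    weights = weights or {}
    tokenized = [d.lower().split() for d in docs]
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    tf = Counter(tokenized[doc_index])
    total = len(tokenized[doc_index])
    n_docs = len(docs)
    scores = {
        term: (count / total) * math.log(n_docs / df[term]) * weights.get(term, 1.0)
        for term, count in tf.items()
    }
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda term: (-scores[term], term))[:n]

docs = [
    "vaccine trial vaccine efficacy covid",
    "covid transmission masks covid",
    "covid vaccine hesitancy survey",
]
print(top_terms(docs, 0, n=2))                           # → ['efficacy', 'trial']
print(top_terms(docs, 0, n=2, weights={"vaccine": 2.0}))  # → ['vaccine', 'efficacy']
```

A term such as "covid" that occurs in every document gets an IDF of zero, which is why corpus-wide vocabulary drops out of the topic keywords.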

9.
Procesamiento Del Lenguaje Natural ; - (69):165-176, 2022.
Article in English | Web of Science | ID: covidwho-2218007

ABSTRACT

Several initiatives have emerged during the COVID-19 pandemic to gather scientific publications related to coronaviruses. Among them, the COVID-19 Open Research Dataset (CORD-19) has proven to be a valuable resource that provides full-text articles from the PubMed Central, bioRxiv and medRxiv repositories. Such a large amount of biomedical literature needs to be properly managed to facilitate and promote its use by health professionals, for example by tagging documents with the biomedical entities that appear in them. We created a biomedical named entity recognition (NER) and normalization (NEN) system that recognizes the drugs, diseases, genes and proteins mentioned in texts and normalizes them to the codes of the main standardization systems, such as MeSH, ICD-10, ATC, SNOMED, ChEBI, GARD and NCBI. It is based on fine-tuning the BioBERT language model independently for each entity type on domain-specific datasets, combined with an inverted index search to normalize the references. We used the resulting BioNER+BioNEN system to process the CORD-19 corpus and offer an overview of the drugs, diseases, genes and proteins related to coronaviruses in the last fifty years.
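A minimal sketch of normalization through an inverted index, assuming a small synonym lexicon: each token of each synonym points to the concept codes it belongs to, and a recognized mention is mapped to the code that shares the most tokens with it. The lexicon and the codes `DRUG:0001` and `DIS:0001` are hypothetical placeholders, not real MeSH, ATC, or ICD-10 identifiers, and the token-vote scoring is an assumed, simplified matching rule.

```python
from collections import defaultdict

# Hypothetical lexicon: concept code -> known surface forms (placeholder codes).
LEXICON = {
    "DRUG:0001": ["remdesivir", "gs-5734"],
    "DIS:0001": ["covid-19", "coronavirus disease 2019"],
}

def build_index(lexicon):
    """Inverted index: lower-cased token -> set of concept codes whose synonyms contain it."""
    index = defaultdict(set)
    for code, names in lexicon.items():
        for name in names:
            for token in name.lower().split():
                index[token].add(code)
    return index

def normalize(mention, index):
    """Return the code sharing the most tokens with the mention, or None if nothing matches."""
    votes = defaultdict(int)
    for token in mention.lower().split():
        for code in index.get(token, ()):
            votes[code] += 1
    if not votes:
        return None
    # Iterate codes in sorted order so ties resolve deterministically.
    return max(sorted(votes), key=lambda code: votes[code])

index = build_index(LEXICON)
print(normalize("Remdesivir", index))                # → DRUG:0001
print(normalize("coronavirus disease 2019", index))  # → DIS:0001
```

Because the index is looked up token by token, normalization stays fast even when the lexicon covers millions of synonyms.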

10.
13th International Conference on Computing Communication and Networking Technologies, ICCCNT 2022 ; 2022.
Article in English | Scopus | ID: covidwho-2213233

ABSTRACT

Health literacy is the ability of a person to read and understand medical text and to use that information to make informed healthcare decisions. Unfortunately, medical articles are difficult for lay readers to comprehend because they use complex language and domain-specific terms. Improving health literacy is important for empowering communities against emerging threats, as the COVID-19 pandemic has shown. One way to improve health literacy is to ease access to complex healthcare information by summarising medical texts and simplifying them lexically, translating specialized medical terminology into lay terms. In this paper, we propose a system that performs extractive summarization on an input medical article, followed by named entity recognition to identify medical terms. The meanings of the identified medical entities are then found through web scraping and displayed to the user along with the summary. We experimented with state-of-the-art summarization models, and ALBERT (A Lite BERT) achieved the best ROUGE-1 score of 0.3789 and ROUGE-L score of 0.2084. © 2022 IEEE.
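ROUGE-1, the summarization metric reported above, measures unigram overlap between a generated summary and a reference summary. The sketch below is a simplified version that assumes whitespace tokenization and lower-casing; standard ROUGE toolkits add their own tokenization and optional stemming, so their scores can differ slightly. The example sentences are invented.

```python
from collections import Counter

def rouge_1(candidate, reference):
    """ROUGE-1 precision, recall and F1 from clipped unigram overlap."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter & Counter keeps the minimum count per word (clipped overlap).
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return precision, recall, 2 * precision * recall / (precision + recall)

candidate = "the patient has high blood pressure"
reference = "the patient has hypertension"
p, r, f = rouge_1(candidate, reference)
print(p, r, round(f, 2))  # → 0.5 0.75 0.6
```

The example also shows why lexical metrics undervalue simplification: "high blood pressure" and "hypertension" mean the same thing yet earn no overlap.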

11.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) ; 13610 LNCS:23-32, 2022.
Article in English | Scopus | ID: covidwho-2173854

ABSTRACT

Biomedical named entity recognition is becoming increasingly important to biomedical research because of the proliferation of articles and the current pandemic. This paper addresses the task of automatically finding and recognizing COVID-related biomedical entity types (e.g., virus, cell, therapeutic) with tolerance rough sets. The task includes i) extracting nouns and their co-occurring contextual patterns from a large COVID-19 BioNER dataset, and ii) annotating unlabelled data with a semi-supervised learning algorithm that uses co-occurrence statistics. Using natural language processing methods, 465,250 noun phrases and 6,222,196 contextual patterns were extracted from 29,500 articles. Three categories have been successfully classified so far: virus, cell and therapeutic. Early precision@N results demonstrate that the proposed tolerant pattern learner (TPL) is able to constrain concept drift in all three categories during the iterative learning process. © 2022, Springer-Verlag GmbH Germany, part of Springer Nature.
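In tolerance rough set models, the tolerance class of an item is the set of items whose similarity to it reaches a threshold θ. The sketch below applies that idea to the setting described above, representing each noun by the set of contextual patterns it co-occurs with and counting shared patterns as the similarity. The nouns, patterns, and θ are invented, and this is a generic illustration rather than the paper's TPL algorithm.

```python
def tolerance_class(noun, contexts, theta):
    """Nouns sharing at least `theta` contextual patterns with `noun`, plus the noun itself."""
    own = contexts[noun]
    return {other for other, patterns in contexts.items() if len(own & patterns) >= theta} | {noun}

# Hypothetical noun -> contextual-pattern sets ("_" marks the noun's slot).
contexts = {
    "sars-cov-2": {"infected with _", "_ replication", "strain of _"},
    "influenza": {"infected with _", "strain of _"},
    "t cell": {"_ response", "activated _"},
    "macrophage": {"activated _", "_ response"},
}
print(sorted(tolerance_class("sars-cov-2", contexts, theta=2)))  # → ['influenza', 'sars-cov-2']
print(sorted(tolerance_class("t cell", contexts, theta=2)))      # → ['macrophage', 't cell']
```

Raising θ shrinks each tolerance class, which is one lever for limiting concept drift when newly labelled nouns are fed back into a bootstrapping loop.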

12.
19th International Conference on Web Information Systems and Applications, WISA 2022 ; 13579 LNCS:267-279, 2022.
Article in English | Scopus | ID: covidwho-2173751

ABSTRACT

Since the COVID-19 outbreak at the end of 2019, routine epidemic prevention and control has become one of the country's core tasks. Checking one's own health against the trajectories of diagnosed patients has gradually become a basic necessity for everyone and essential to epidemic prevention. Spatio-temporal information about COVID-19 patients helps the public check whether their own trajectories overlap with those of confirmed cases, which supports epidemic prevention work. This paper proposes a named entity recognition model that automatically identifies time and place information in COVID-19 patient trajectory texts. The model consists of an ALBERT layer, a Bi-GRU layer, and a GlobalPointer layer. The first two layers jointly extract contextual features and semantic dependencies, and the GlobalPointer layer extracts the corresponding named entities from a global perspective, which improves recognition of long, nested place and time entities. Compared with conventional named entity recognition models, the proposed model is more efficient, with a smaller parameter scale and faster training. We evaluate the model on a dataset crawled from official COVID-19 trajectory texts. The model reaches an F1-score of 92.86%, outperforming four traditional named entity recognition models. © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
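The key idea behind a GlobalPointer-style layer is to score every candidate span (i, j) directly, typically as a dot product between a "start" query vector at token i and an "end" key vector at token j, and to keep every span whose score exceeds a threshold. Because spans are scored independently, nested entities can both be returned. The sketch below shows only that decoding step for one entity type, with tiny hand-picked vectors; the real layer learns the vectors, uses one query/key projection per entity type, and adds rotary position encoding, none of which is shown.

```python
def global_pointer_spans(queries, keys, threshold=0.0):
    """Score every span (i, j), i <= j, as q_i . k_j and keep spans scoring above threshold.
    Spans are scored independently, so overlapping or nested spans can all be returned."""
    n = len(queries)
    spans = []
    for i in range(n):
        for j in range(i, n):
            score = sum(q * k for q, k in zip(queries[i], keys[j]))
            if score > threshold:
                spans.append((i, j))
    return spans

# Hand-picked 2-d vectors for a 3-token input (illustrative only).
queries = [[1, 0], [0, 1], [0, 0]]
keys = [[-1, 0], [0, -1], [1, 1]]
print(global_pointer_spans(queries, keys))  # → [(0, 2), (1, 2)]
```

Span (1, 2) lies inside span (0, 2), which is exactly the long, nested place or time case that sequence-labelling decoders such as a CRF cannot represent.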

13.
4th International Conference on Information Systems and Management Science, ISMS 2021 ; 521 LNNS:419-427, 2023.
Article in English | Scopus | ID: covidwho-2173622

ABSTRACT

Entity extraction from text in the biomedical domain plays an essential role in biomedical research. In natural language processing, the entity extraction task aims to classify terms into predefined categories. With the emergence of COVID-19, COVID-related digital resources increased drastically and new types of entities were introduced. State-of-the-art named entity extraction models rely heavily on domain-specific resources and struggle to perform adequately on COVID-related data. In this paper, we propose a deep-learning-based architecture for named entity recognition. The experiments were performed on the CORD-NER dataset released by the University of Illinois. We compare the performance of different deep-learning-based architectures on this dataset for the named entity recognition task. © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.

14.
Comput Biol Chem ; 102: 107808, 2023 Feb.
Article in English | MEDLINE | ID: covidwho-2165189

ABSTRACT

The number of published biomedical articles has increased rapidly over the years. Currently there are about 30 million articles in PubMed and over 25 million mentions in Medline. Among the fundamental tasks for analysing this literature, Biomedical Named Entity Recognition (BioNER) and Biomedical Relation Extraction (BioRE) are the most essential. In the biomedical domain, knowledge graphs are used to visualize the relationships between entities such as proteins, chemicals and diseases. Scientific publications have increased dramatically as a result of the search for treatments and potential cures for the new coronavirus, but efficiently analysing, integrating, and utilising related sources of information remains difficult. To combat the disease effectively during pandemics like COVID-19, the literature must be used quickly and effectively. In this paper, we introduce a fully automated framework consisting of BERT-BiLSTM, knowledge graph, and representation learning models to extract the top diseases, chemicals, and proteins related to COVID-19 from the literature. The framework uses Named Entity Recognition models for disease, chemical, and protein recognition, followed by Chemical-Disease and Chemical-Protein Relation Extraction models. The system applies these models to extract entities and relations from the CORD-19 dataset and builds a Knowledge Graph (KG) from them. It then performs Representation Learning on this KG to obtain embeddings of all entities and identify the diseases, chemicals, and proteins most related to COVID-19.
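The abstract does not name the representation learning method, so the sketch below uses TransE purely as an illustrative choice. TransE embeds a triple (head, relation, tail) so that head + relation ≈ tail, and candidate tails can then be ranked by how close they land to head + relation. The 2-d embeddings, the relation, and the candidate names (`chem_A` and so on) are hypothetical; in practice they would be learned from the extracted KG.

```python
import math

def transe_score(h, r, t):
    """TransE-style plausibility: negative L2 distance between h + r and t (higher is better)."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def rank_tails(head, relation, candidates):
    """Candidate names sorted from most to least plausible tail for (head, relation)."""
    return sorted(candidates, key=lambda name: transe_score(head, relation, candidates[name]), reverse=True)

# Hypothetical learned embeddings (2-d for readability).
covid_19 = [0.0, 0.0]
related_to = [1.0, 0.0]
candidates = {
    "chem_A": [1.0, 0.1],
    "chem_B": [0.0, 1.0],
    "chem_C": [0.7, 0.0],
}
print(rank_tails(covid_19, related_to, candidates))  # → ['chem_A', 'chem_C', 'chem_B']
```

Ranking all candidate entities this way yields a "top related" list per relation type, which matches the kind of output the framework reports.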


Subject(s)
COVID-19 , Pattern Recognition, Automated , Humans , Data Mining/methods
15.
15th International Conference on Advanced Computer Theory and Engineering, ICACTE 2022 ; : 78-82, 2022.
Article in English | Scopus | ID: covidwho-2161397

ABSTRACT

The world was thrown into disarray when the novel coronavirus first emerged. When the World Health Organization (WHO) declared the novel coronavirus outbreak a public health emergency of international concern (PHEIC), people adopted safety protocols to minimize the effect of the virus. One of these was the implementation of e-learning in countries including the Philippines. Once contactless learning began, students' motivation decreased due to a lack of private space or a classroom and of face-to-face communication with their teachers. Learners' motivation is crucial because it influences their learning pace. The researchers developed a tool to help students with their studies and to motivate them. LINYA is a web-based text annotation tool developed using an NLP method in machine learning. The researchers tested the web tool using automated Agile testing in four phases, beginning with component testing and progressing to integration, system, and acceptance testing. Based on simulated data, the tests showed favorable results, with mean scores ranging from 3.8 to 4.6 across all areas of a usability test, which further indicates that the developed system is ready for implementation. © 2022 IEEE.

16.
J Med Internet Res ; 24(11): e34067, 2022 11 02.
Article in English | MEDLINE | ID: covidwho-2098982

ABSTRACT

BACKGROUND: Evidence from peer-reviewed literature is the cornerstone for designing responses to global threats such as COVID-19. In massive and rapidly growing corpora, such as COVID-19 publications, assimilating and synthesizing information is challenging. A robust computational pipeline that evaluates multiple aspects, such as network topological features, communities, and their temporal trends, can make this process more efficient. OBJECTIVE: We aimed to show that new knowledge can be captured and tracked using the temporal change in the underlying unsupervised word embeddings of the literature, and that imminent themes can be predicted using machine learning on the evolving associations between words. METHODS: Frequently occurring medical entities were extracted from the abstracts of more than 150,000 COVID-19 articles published in the World Health Organization database, collected at monthly intervals starting in February 2020. Word embeddings trained on each month's literature were used to construct networks of entities with cosine similarities as edge weights. Topological features of the subsequent month's network were forecasted based on prior patterns, and new links were predicted using supervised machine learning. Community detection and alluvial diagrams were used to track biomedical themes that evolved over the months. RESULTS: Thromboembolic complications were detected as an emerging theme as early as August 2020. A shift toward the symptoms of long COVID complications was observed during March 2021, and neurological complications gained significance in June 2021. A prospective validation of the link prediction models achieved an area under the receiver operating characteristic curve of 0.87. Based on the patterns observed in previous months, predictive modeling revealed predisposing conditions, symptoms, cross-infection, and neurological complications as dominant research themes in COVID-19 publications. 
CONCLUSIONS: Machine learning-based prediction of emerging links can contribute toward steering research by capturing themes represented by groups of medical entities, based on patterns of semantic relationships over time.
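The network-construction step described in METHODS can be sketched as follows: each pair of entities is connected by an edge weighted with the cosine similarity of their word embeddings. The `min_weight` cutoff is an assumption added to keep the toy graph sparse (the abstract does not say whether weak edges were pruned), and the 2-d monthly embeddings are invented for illustration.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def similarity_network(embeddings, min_weight=0.5):
    """Undirected weighted edges between entities whose cosine similarity reaches min_weight."""
    edges = {}
    for a, b in combinations(sorted(embeddings), 2):
        weight = cosine(embeddings[a], embeddings[b])
        if weight >= min_weight:
            edges[(a, b)] = weight
    return edges

# Hypothetical embeddings for one month's literature.
month_embeddings = {
    "thrombosis": [1.0, 0.0],
    "d-dimer": [0.8, 0.6],
    "anosmia": [0.0, 1.0],
}
edges = similarity_network(month_embeddings)
for (a, b), w in edges.items():
    print(a, b, round(w, 2))
```

Building one such graph per month gives the time series of networks whose topological features and new links the study then forecasts.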


Subject(s)
COVID-19 , Humans , Machine Learning , Semantics , Supervised Machine Learning , Post-Acute COVID-19 Syndrome
17.
IEEE Access ; 10:104156-104168, 2022.
Article in English | Web of Science | ID: covidwho-2070271

ABSTRACT

Named entity recognition applied to COVID-19 epidemiological investigation information can help analyze the sources and transmission routes of the epidemic and thus better control its spread. This paper therefore proposes BERT-BiLSTM-IDCNN-ELU-CRF (BBIEC), a Chinese named entity recognition model for COVID-19 epidemiological investigation information built on the BERT pre-trained model. The model first converts the unlabeled COVID-19 epidemiological investigation information into a character-level corpus and manually annotates its entities according to the BIOES character-level labeling scheme; it then uses the BERT pre-trained model to obtain word vectors with position information. Next, a bidirectional long short-term memory network (BiLSTM) and an improved iterated dilated convolutional neural network (IDCNN) extract global context and local features from the generated word vectors, and these features are concatenated serially; all possible label sequences are passed to the conditional random field (CRF); finally, the CRF decodes and generates the entity tag sequence. The experimental results show that the model outperforms other traditional models in recognizing entities in COVID-19 epidemiological investigation information.
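In the BIOES scheme mentioned above, B, I, and E mark the beginning, inside, and end of a multi-character entity, S marks a single-character entity, and O marks characters outside any entity. The sketch below converts a decoded tag sequence back into (start, end, type) spans and skips malformed fragments. The entity type names (LOC, TIM, PER) are hypothetical, since the abstract does not list the model's entity categories.

```python
def bioes_to_spans(tags):
    """Convert BIOES tags to (start, end, type) spans; malformed fragments are skipped."""
    spans, start = [], None  # start holds (index, type) of an open B- entity
    for i, tag in enumerate(tags):
        prefix, _, etype = tag.partition("-")
        if prefix == "S":
            spans.append((i, i, etype))
            start = None
        elif prefix == "B":
            start = (i, etype)
        elif prefix == "I":
            if start is None or start[1] != etype:
                start = None  # I- without a matching B-: discard
        elif prefix == "E":
            if start is not None and start[1] == etype:
                spans.append((start[0], i, etype))
            start = None
        else:  # "O"
            start = None
    return spans

# Hypothetical character-level tags; the B-PER entity never closes, so it is dropped.
tags = ["B-LOC", "I-LOC", "E-LOC", "O", "S-TIM", "B-PER", "O"]
print(bioes_to_spans(tags))  # → [(0, 2, 'LOC'), (4, 4, 'TIM')]
```

The explicit E and S tags make span boundaries unambiguous, which is one reason BIOES is often preferred over plain BIO for character-level Chinese NER.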

18.
Procedia Comput Sci ; 205: 117-126, 2022.
Article in English | MEDLINE | ID: covidwho-2042094

ABSTRACT

This paper outlines the development and use of a tool suite developed by the NCI Agency to provide situational awareness and decision support during the current Covid-19 pandemic. The tool suite was developed to understand how Covid-19 could affect the provision of communication and information services (CIS) to NATO, and thus to identify where risks to NATO operational functions might occur. The tool suite combines open-source data on Covid-19 cases worldwide with internal information about the impact of Covid-19 on NCI Agency staff and the services they deliver to the NATO enterprise. It supports business impact assessments related to Covid-19, showing trends and age demographics and providing early indications of critical services and sites that may be affected. The tool suite is an example of data science techniques supporting data-driven decision making within a military organization.

19.
J Cheminform ; 14(1): 55, 2022 Aug 13.
Article in English | MEDLINE | ID: covidwho-1993381

ABSTRACT

MOTIVATION: Chemical named entity recognition (CNER) algorithms allow information about chemical compound identifiers to be retrieved from texts and associated with physical-chemical properties and biological activities. Scientific texts are weakly formalized sources of information. Most CNER methods are based on machine learning approaches, including conditional random fields and deep neural networks. In general, most machine learning approaches require either vector or sparse word representations of texts. Chemical named entities (CNEs) constitute only a small fraction of the whole text, and the datasets used for training are highly imbalanced. METHODS AND RESULTS: We propose a new method for extracting CNEs from texts based on the naïve Bayes classifier combined with specially developed filters. In contrast to earlier CNER methods, our approach represents the data as a set of fragments of text (FoTs) and then prepares a set of multi-n-grams (sequences of one to n symbols) for each FoT. Our approach may enable the recognition of novel CNEs. For the CHEMDNER corpus, the sensitivity (recall) was 0.95, precision was 0.74, specificity was 0.88, and balanced accuracy was 0.92, based on five-fold cross-validation. We applied the developed algorithm to extract CNEs of potential severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) main protease (Mpro) inhibitors. A set of CNEs corresponding to the chemical substances evaluated in the biochemical assays used to discover Mpro inhibitors was retrieved. Manual analysis of the relevant texts showed that our method successfully identified CNEs of potential SARS-CoV-2 Mpro inhibitors. 
CONCLUSION: The obtained results show that the proposed method can be used for filtering out words that are not related to CNEs; therefore, it can be successfully applied to the extraction of CNEs for the purposes of cheminformatics and medicinal chemistry.
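The multi-n-gram representation described in METHODS AND RESULTS can be sketched as collecting every character substring of length 1 through n from a text fragment; these strings would then serve as features for a classifier such as naïve Bayes, which is not shown here. The sketch also confirms that balanced accuracy, the mean of sensitivity and specificity, gives (0.95 + 0.88) / 2 = 0.915, consistent with the reported 0.92 after rounding. The fragment "CuO" is an arbitrary example.

```python
def multi_ngrams(fragment, n):
    """All character n-grams of length 1..n from a text fragment, shortest first."""
    return [fragment[i:i + k] for k in range(1, n + 1) for i in range(len(fragment) - k + 1)]

def balanced_accuracy(sensitivity, specificity):
    """Mean of the true positive rate and the true negative rate."""
    return (sensitivity + specificity) / 2

print(multi_ngrams("CuO", 2))  # → ['C', 'u', 'O', 'Cu', 'uO']
print(balanced_accuracy(0.95, 0.88))
```

Balanced accuracy suits this task because CNEs are a small minority of tokens, so plain accuracy would be dominated by correctly rejected non-chemical words.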

20.
13th International Conference on Semantic Web Applications and Tools for Health Care and Life Sciences, SWAT4HCLS 2022 ; 3127:108-117, 2022.
Article in English | Scopus | ID: covidwho-1823711

ABSTRACT

The emergence of Coronavirus Disease 2019 (COVID-19) has further highlighted the need for timely support for clinicians managing severely ill patients. We combine Semantic Web technologies with Deep Learning for Natural Language Processing with the aim of converting human-readable best evidence/practice for COVID-19 into a computer-interpretable form. We present the results of experiments with 1212 clinical ideas (medical terms and expressions) from two UK national healthcare services specialty guides for COVID-19 and three versions of two BMJ Best Practice documents for COVID-19. The paper seeks to recognise and categorise clinical ideas by performing a Named Entity Recognition (NER) task, with an ontology providing extra terms as context and describing the intended meaning of categories in terms understandable by clinicians. The paper investigates: 1) the performance of classical NER using MetaMap versus NER with fine-tuned BERT models; 2) the integration of both NER approaches using a lightweight ontology developed in close collaboration with senior doctors; and 3) how easily junior doctors can interpret the main classes of the ontology once it is populated with NER results. We report the NER performance and the observed agreement for human audits. Copyright © 2022 for this paper by its authors.
